When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration

Authors

  • Yu-Ting Chen
  • Jason Cong
  • Zhenman Fang
  • Jie Lei
  • Peng Wei
Abstract

{ytchen, cong, zhenman, jielei and peng.wei.prc}@cs.ucla.edu, University of California, Los Angeles

FPGA-enabled datacenters have shown great potential for improving performance and energy efficiency. In this paper we aim to answer one key question: how can we efficiently integrate FPGAs into state-of-the-art big-data computing frameworks like Apache Spark? To provide a generalized methodology and insights for efficient integration, we conduct an in-depth analysis of the challenges at the single-thread, single-node multi-thread, and multi-node levels, and propose solutions, including batch processing and the FPGA-as-a-Service framework, to address them. With a step-by-step case study of a next-generation DNA sequencing application, we demonstrate how a straightforward integration with a 1,000x slowdown can be tuned into an efficient integration with a 2.6x overall system speedup and a 2.4x energy efficiency improvement.
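The abstract names batch processing as one of the proposed solutions but does not say how it is implemented. The following is only a minimal sketch of the general idea, assuming that each accelerator invocation carries a fixed overhead that can be amortized by grouping many small tasks into one call. Every name here (`batched`, `CountingAccelerator`, `process_reads`) is hypothetical, and the "accelerator" is a plain Python stand-in, not the authors' FPGA interface.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group items into lists of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


class CountingAccelerator:
    """Hypothetical stand-in for an accelerator; it only counts invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, batch: List[str]) -> List[int]:
        # One invocation handles the whole batch, so any fixed per-call
        # cost (assumed here, not measured) is paid once per batch.
        self.calls += 1
        return [len(read) for read in batch]  # placeholder computation


def process_reads(reads: Iterable[str],
                  accel: CountingAccelerator,
                  batch_size: int) -> List[int]:
    """Send reads to the accelerator in batches and collect the results in order."""
    results: List[int] = []
    for batch in batched(reads, batch_size):
        results.extend(accel.run(batch))
    return results
```

With ten reads, a batch size of 1 would invoke the stand-in ten times, while a batch size of 4 would invoke it three times. The real trade-off between batch size, latency, and memory is not described in the abstract.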

Similar Resources

Need and Role of Scala Implementations in Bioinformatics

Next Generation Sequencing has generated large amounts of omics data at a speed that was not possible before. This data is only useful if it can be stored and analyzed at the same speed. Big Data platforms and tools like Apache Hadoop and Spark have solved this problem. However, most of the algorithms used in bioinformatics for Pairwise alignment, Multiple Alignment and...

Cloudflow - enabling faster biomedical pipelines with MapReduce and Spark

For many years Apache Hadoop has been used as a synonym for processing data in the MapReduce fashion. However, due to the complexity of developing MapReduce applications, adoption of this paradigm in genetics has been limited. To alleviate some of the issues, we have previously developed Cloudflow, a high-level pipeline framework that allows users to create sophisticated biomedical pipelines usi...

Next Generation Sequencing and its Application in the Study of Microbiome in Plant Diseases Suppressive Soils

Progress in next-generation sequencing has played a significant role in ecological studies of microbial populations. These advances have led to a rapid evaluation in metagenomics studies (analysis of DNA of microbial communities without the need to culture). Many statistical and computational tools and metagenomics databases have led to the discovery of huge amounts of data. In this research, i...

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to ...

Strategies and Clinical Applications of Next Generation Sequencing

DNA sequencing is one of the most valuable techniques in molecular biology and can be used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), revolutionized genomic research and molecular biology; as a result, the whole human genome can be sequenced at low cost in several days. NGS technology is simi...

Publication date: 2016